VTT Loader Node
The VTT Loader node retrieves WebVTT subtitle files from URLs and parses them into structured caption objects with timing information. It automatically converts VTT timestamps to milliseconds for programmatic processing. The parsed captions are returned as an array of objects containing start time, end time, and text content.
How It Works
When the node executes, it downloads the VTT file from the specified URL, parses the WebVTT format to extract individual captions with their timing information, and converts timestamps from the standard VTT time format (HH:MM:SS.mmm) into milliseconds. Each caption in the VTT file becomes a separate object in the output array, preserving sequential order and timing information from the original file.
The node supports both simple timestamps (MM:SS.mmm) and full timestamps (HH:MM:SS.mmm), converting both formats to millisecond precision for consistency. Downloaded files are temporarily stored during processing and automatically cleaned up after parsing completes.
The output is an array of caption objects where each object contains three properties: startTime (integer in milliseconds), endTime (integer in milliseconds), and text (string content). This structured format is compatible with various downstream operations such as text analysis, caption editing, timestamp manipulation, or conversion to other subtitle formats.
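The parsing and conversion steps described above can be approximated with the following minimal sketch. It is not the node's actual implementation: the names `timestamp_to_ms` and `parse_vtt` are hypothetical, and joining multi-line caption text with a newline is an assumption, since the behavior for multi-line cues is not specified here.

```python
import re

# Hours are optional, so both MM:SS.mmm and HH:MM:SS.mmm match.
TIMESTAMP = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
CUE_TIMING = re.compile(rf"^({TIMESTAMP})\s+-->\s+({TIMESTAMP})")


def timestamp_to_ms(stamp: str) -> int:
    """Convert 'MM:SS.mmm' or 'HH:MM:SS.mmm' to integer milliseconds."""
    clock, millis = stamp.split(".")
    fields = [int(part) for part in clock.split(":")]
    # Pad with a zero hour when the short MM:SS.mmm form is used.
    hours, minutes, seconds = ([0] + fields)[-3:]
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def parse_vtt(content: str) -> list[dict]:
    """Return caption objects in file order (hypothetical helper)."""
    captions = []
    # Normalise line endings, then split on blank lines between blocks.
    normalised = content.replace("\r\n", "\n").replace("\r", "\n")
    for block in re.split(r"\n{2,}", normalised):
        lines = block.strip().split("\n")
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if match:
                captions.append({
                    "startTime": timestamp_to_ms(match.group(1)),
                    "endTime": timestamp_to_ms(match.group(2)),
                    # Assumption: multi-line cue text is joined with newlines.
                    "text": "\n".join(lines[i + 1:]),
                })
                break
        # Blocks without a timing line (header, notes) are skipped.
    return captions
```

Blocks that contain no timing line, such as the `WEBVTT` header, are simply ignored in this sketch, and an optional cue identifier line before the timing line is tolerated.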
Configuration Parameters
Input Field
Input Field (Text, Required): Workflow variable containing the VTT URL.
The URL must start with http:// or https:// and point to a valid VTT file. Variable interpolation using ${variable_name} syntax supports dynamic URL construction. The VTT file should follow the WebVTT specification with properly formatted timestamps and caption text.
Common patterns: https://storage.example.com/subtitles/video123.vtt, ${subtitle_url}, https://cdn.example.com/captions/${video_id}.vtt.
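To illustrate how a `${variable_name}` pattern resolves to a concrete URL, the sketch below uses Python's `string.Template`, which happens to share the same placeholder syntax. This only mimics the substitution and the scheme requirement; the workflow engine's own interpolation logic may differ, and `resolve_vtt_url` is a hypothetical helper name.

```python
from string import Template
from urllib.parse import urlparse


def resolve_vtt_url(pattern: str, variables: dict[str, str]) -> str:
    """Fill ${name} placeholders, then check the http/https requirement."""
    # substitute() raises KeyError if a referenced variable is missing.
    url = Template(pattern).substitute(variables)
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"URL must start with http:// or https://: {url!r}")
    return url


url = resolve_vtt_url(
    "https://cdn.example.com/captions/${video_id}.vtt",
    {"video_id": "video123"},
)
```

Here `url` becomes `https://cdn.example.com/captions/video123.vtt`.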
Output Field
Output Field (Text, Required): Workflow variable where parsed caption objects are stored.
The output is an array of caption objects with three properties per object: startTime (milliseconds), endTime (milliseconds), and text (string content). The array preserves sequential order from the VTT file.
Example output structure:
[
  {"startTime": 0, "endTime": 2500, "text": "Welcome to the video."},
  {"startTime": 2500, "endTime": 5000, "text": "This is the second caption."}
]
Common naming patterns: vtt_captions, subtitle_data, caption_list, parsed_captions.
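Because each caption is a plain object with `startTime`, `endTime`, and `text`, downstream conversion to another subtitle format is mostly a matter of formatting. The following is a minimal sketch of converting the output array to SRT text in a downstream code step. The function names are hypothetical and not part of the node.

```python
def ms_to_srt_timestamp(ms: int) -> str:
    """Format integer milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def captions_to_srt(captions: list[dict]) -> str:
    """Render caption objects as numbered SRT blocks, in array order."""
    blocks = []
    for index, caption in enumerate(captions, start=1):
        start = ms_to_srt_timestamp(caption["startTime"])
        end = ms_to_srt_timestamp(caption["endTime"])
        blocks.append(f"{index}\n{start} --> {end}\n{caption['text']}\n")
    # A blank line separates consecutive SRT blocks.
    return "\n".join(blocks)
```

Since the timestamps are already integers in milliseconds, no parsing is needed on the way out, only integer division.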
Common Parameters
This node supports common parameters shared across workflow nodes, including Stream Output Response, Streaming Messages, and Logging Mode. For detailed information, see Common Parameters.
Best Practices
- Validate that VTT URLs are accessible and return valid WebVTT content before processing
- Implement error handling using conditional nodes to gracefully handle missing or malformed files
- Use variable interpolation to build URLs dynamically from video IDs, so workflows stay reusable without hardcoded values
- Verify VTT files contain expected caption structure and timing information before passing to downstream nodes
- For video synchronization, ensure timestamps match the video's actual timing to prevent caption misalignment
- Descriptive variable names like video_subtitles improve workflow maintainability over generic names
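The structure and timing checks recommended above could be sketched as follows, for example in a code step placed between this node and downstream nodes. The helper name `find_caption_problems` and the specific checks are illustrative assumptions, not built-in behavior of the node.

```python
REQUIRED_KEYS = {"startTime", "endTime", "text"}


def find_caption_problems(captions: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    if not captions:
        return ["no captions were parsed"]
    problems = []
    previous_start = -1
    for i, caption in enumerate(captions):
        missing = REQUIRED_KEYS - caption.keys()
        if missing:
            problems.append(f"caption {i}: missing {sorted(missing)}")
            continue
        if caption["endTime"] <= caption["startTime"]:
            problems.append(f"caption {i}: endTime is not after startTime")
        if caption["startTime"] < previous_start:
            problems.append(f"caption {i}: starts before the previous caption")
        previous_start = caption["startTime"]
    return problems
```

A conditional node can then branch on whether the returned list is empty.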
Limitations
- URL-only support: The node only supports loading VTT files from URLs (HTTP/HTTPS). Local file paths are not supported.
- No authentication headers: Custom HTTP headers for authentication are not supported. Credentials must be included in the URL as query parameters, or the endpoint must be publicly accessible.
- Download timeout range: The download timeout is configurable between 10 seconds and 5 minutes (10,000 to 300,000 ms). Very large VTT files or slow connections may still exceed the maximum timeout.
- No format validation: The node does not validate WebVTT format compliance beyond basic parsing. Malformed VTT files may cause parsing errors or produce incomplete caption data.
- Timestamp precision: Timestamps are converted to whole milliseconds. Sub-millisecond timing information is not preserved.
- No styling information: The node extracts only caption text and timing information. WebVTT styling, positioning, and formatting cues are not preserved.
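Since custom headers are not available, one way to satisfy the query-parameter requirement is to build the URL before it reaches the node. The sketch below appends credentials to an existing URL with Python's standard `urllib.parse`. The parameter name `token` and the helper name are hypothetical; the actual parameter depends on the storage provider.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_query_credentials(url: str, credentials: dict[str, str]) -> str:
    """Append credential parameters, keeping any existing query string."""
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(credentials.items())
    # urlencode percent-escapes values that are not URL-safe.
    return urlunparse(parts._replace(query=urlencode(query)))


signed_url = with_query_credentials(
    "https://storage.example.com/subtitles/video123.vtt",
    {"token": "abc"},  # hypothetical parameter name and value
)
```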